Date/Time issue with CFINDEX and SOLR

This post is more than 2 years old.

I'm not sure if this is a bug or totally expected, but as it hit my blog, I figured I'd share it. A reader (thank you, Aaron!) noted that searches on my blog were all returning dates in 2000 and 2001.

I noticed that the months and days were right; it was just the year that was off. I then noticed that most of my posts were in the AM, including some at 1 and 3AM. Now, I'm not that great of a sleeper, but even I need to sleep sometime.

I first looked at the code I used to index my data. (By the way, did I mention I switched to SOLR-based searching here? Well, I did. :) This is the code used to query the database and store the results in the index. I only use this when I need to blow away everything and start fresh, but similar code is used for atomic inserts as well.
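As a rough sketch of that kind of indexing call (not my exact code: the datasource, table, column, and collection names here are all assumptions for illustration), a full rebuild looks something like this:

```cfml
<!--- Sketch only: datasource, table, column, and collection names are assumed --->
<cfquery name="getEntries" datasource="myblog">
    select id, title, body, posted
    from   tblblogentries
</cfquery>

<!--- action="refresh" clears the collection, then indexes every row of the query --->
<cfindex action="refresh"
         collection="blogentries"
         query="getEntries"
         type="custom"
         key="id"
         title="title"
         body="title,body"
         posted_dt="posted">
```

The interesting part is the last attribute: posted_dt is a custom field whose value comes from the posted column of the query.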

Note the use of the custom field, posted_dt. The _dt suffix tells cfindex to store the value in the dt (date/time) format. According to the docs:

Note: _dt supports only the date formats supported by ColdFusion.

Since the dates from the database were being used already by ColdFusion in my entry display, I assumed I was ok. Here is an example of one of the dates from my blog:

2011-12-22 10:22:00.0

I then went to the front end and added a dump to my search results. This is where I noticed something odd. Here is what one of my results looked like from cfsearch:

Thu Dec 15 12:30:00 CST 2011

That passes isDate, but if I parseDateTime the string, I get:

{ts '2001-12-15 02:30:00'}

So it appears that I can pass cfindex a value that ColdFusion handles correctly, but SOLR returns something that ColdFusion cannot parse correctly. Luckily, I've got no issue just printing exactly what SOLR returned. It doesn't exactly match how I show dates elsewhere, but frankly, I don't care. I could have gotten around this - possibly - by storing the value with the _s suffix instead (i.e., as a simple string), but again, I'm happy just displaying the result as is.
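If you did want a real date back, one option is to hand the string to Java instead of parseDateTime. This is a minimal sketch, not something I run here. It assumes the value is in the format produced by java.util.Date's toString() (which is what the sample above looks like) and that the timezone abbreviation is one Java recognizes. The variable names are made up for illustration.

```cfml
<cfscript>
// Value as returned by cfsearch for the posted_dt field (sample from above)
solrDate = "Thu Dec 15 12:30:00 CST 2011";

// Pattern assumed to match java.util.Date.toString(). Locale.ENGLISH is passed
// so the day and month names parse regardless of the server's locale.
enLocale  = createObject("java", "java.util.Locale").ENGLISH;
formatter = createObject("java", "java.text.SimpleDateFormat").init("EEE MMM dd HH:mm:ss zzz yyyy", enLocale);

// parse() returns a java.util.Date, which ColdFusion's date functions accept
posted = formatter.parse(solrDate);

writeOutput(dateFormat(posted, "mm/dd/yyyy") & " " & timeFormat(posted, "h:mm tt"));
</cfscript>
```

The time that gets displayed depends on the server's timezone, and abbreviations like CST can be ambiguous, so treat the result with some care if the exact hour matters.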

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Patrick Heppler posted on 3/12/2013 at 12:10 PM

Maybe it's just a typo but the time in 2011-12-22 10:22:00.0 looks a bit weird

Comment 2 by Raymond Camden posted on 3/12/2013 at 5:47 PM

AFAIK it is valid. The .0 at the end is just the fractional seconds.

Comment 3 by Ty Whalin posted on 6/14/2013 at 2:22 PM

Okay, questions. I want to ask you about posted_dt. Are you saying that the _dt at the end of this custom field is what tells SOLR that it is a datetime field? If yes, where can I find more of those endings for cfindex control? I looked through a lot of documentation and have not noticed those yet. SOLR was introduced in CF9, correct?

Comment 4 by Raymond Camden posted on 6/14/2013 at 3:13 PM

Yeah, but custom fields were CF10, not 9, if I remember correctly.

Comment 5 by Ty Whalin posted on 6/14/2013 at 3:44 PM

I figured that was the case. I will need to do more research on those handy dandy extensions for custom fields. I am currently creating a tag and category system along with a search box for searching blog pages by keyword, tag, or category. The only real problem I continue to battle is that it keeps wanting to show results for both the document pagename.cfm and the document title. I only want it to show the document title, turned into a link to the blog post. Basically, it is producing the same document in the results twice, with two different links to the same document.

Any suggestions?

Comment 6 by Raymond Camden posted on 6/14/2013 at 3:51 PM

I'm sorry, I don't understand what you are saying. You have complete control over how you display results from cfsearch. I must not be understanding what you mean here.

Comment 7 by Ty Whalin posted on 6/14/2013 at 5:01 PM

Result output:
Title = link
pagename = link

This is the same output for the same page formatted as ahref. Only want the first one to show.

Comment 8 by Raymond Camden posted on 6/14/2013 at 5:18 PM

.... um... again... I'm confused. The result of cfsearch is a query. *You* decide what columns to output when you loop over it.

Comment 9 by Ty Whalin posted on 6/14/2013 at 5:49 PM

I believe the linking output problem is solved at this point. Next, it would appear I need to create more than one cfcollection: one for normal searching and a second for tags and categories. If you follow http://www.linkworxseo.com/... and then click the first tag, named analytics, you will be shown a result set for the tag and a different result set for the same cfcollection. This is why the search input should be split from the tag results output. I am thinking of a second cfcollection to split the searches. Not all tags and categories are ready yet, as I am still developing.

Comment 10 by Raymond Camden posted on 6/14/2013 at 6:39 PM

I'm confused - why do you need a second collection for categories and tags? CF's full text indexing supports categorizing content.

Comment 11 by Ty Whalin posted on 6/14/2013 at 9:44 PM

Yeah, I understand that the categoryTree and category can work, but I think I am torn between running a SQL statement or a cfindex tag to populate the cfsearch for results output.

<cfloop index="i" from="2011" to="#LSDateFormat(now(),'yyyy')#">
    <cfindex
        categoryTree="tag/"
        category="#form.Criteria#"
        collection="site-search"
        action="update"
        type="path"
        key="#expandPath('\')#blog\#i#"
        custom1="pagepath"
        custom2="pagename"
        custom3="pagedescription"
        extensions="*."
        recurse="yes"
        language="english"
        urlpath="http://www.linkworxseo.com/...">
</cfloop>

This cfindex is going to change my first site-search cfindex on the tag/ page. This cfindex is on the tag/analytics/ page, which is a tag. I am thinking about using refresh first and then running an update for the action on the tag/ page. What do you think?

Comment 12 by Raymond Camden posted on 6/14/2013 at 11:10 PM

Um... no idea what you're doing here. :) But yeah - doing a refresh with a query (and it can be a fake query) is better than N atomic index operations.

Comment 13 by Ty Whalin posted on 6/15/2013 at 4:11 PM

Can you fix that link for me? Remove it or remove everything after blog/... Appreciate it.

Comment 14 by Raymond Camden posted on 6/15/2013 at 4:57 PM

Eh? You mean in the code?

Comment 15 by Ty Whalin posted on 6/16/2013 at 1:35 AM

No, just the bad link at urlpath. I think I got it all worked out, but it seems the cfindex is not updating. I ran a refresh and then changed it back to update, and the results are only showing for one keyword now (google). I am going to give it some time to find out if it will start showing other results.

Comment 16 by Jim Allison posted on 11/8/2014 at 3:58 AM

It has been a while since this thread/question was posted, but I think I am running into the same issue: I can't DateFormat() the date from a Solr search. My dates from Solr look like this:

Mon Apr 07 10:32:00 MDT 2014

When I do a DateFormat(myDate_dt,"mm/dd/yy") I get:

04/07/01

I really don't want to show all this date information from the Solr search. Any thoughts or solutions?

Thank you

Jim

Comment 17 by Raymond Camden posted on 11/8/2014 at 8:03 PM

You could parse it as a list. List item 2 is the month, 3 is the day, and 6 is the year. You can ignore the timezone. Technically it may matter, like maybe it was 1am and in the TZ you care about it is 11pm, but probably it won't matter.
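The list approach described above might look like this. It is only a sketch: the variable names are illustrative, and it assumes the month always comes back as an English three-letter abbreviation.

```cfml
<cfscript>
// Sample value from the comment above
solrDate = "Mon Apr 07 10:32:00 MDT 2014";

// Space-delimited items: 1=weekday 2=month 3=day 4=time 5=timezone 6=year
monthAbbr = listGetAt(solrDate, 2, " ");
dayNum    = listGetAt(solrDate, 3, " ");
yearNum   = listGetAt(solrDate, 6, " ");

// Convert the month abbreviation to its number using a lookup list
monthNum = listFindNoCase("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", monthAbbr);

// Build a real date object; the time and timezone are deliberately ignored
posted = createDate(yearNum, monthNum, dayNum);

writeOutput(dateFormat(posted, "mm/dd/yy"));
</cfscript>
```

Because the year is read directly from the string, the formatted result carries 2014 rather than the bogus 2001.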

Note that I no longer use Solr here, I switched to a Google Custom Search Engine.